-
Notifications
You must be signed in to change notification settings - Fork 2
LibBlosc2: New codec #54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #54 +/- ##
===========================================
- Coverage 98.24% 86.55% -11.69%
===========================================
Files 5 4 -1
Lines 456 186 -270
===========================================
- Hits 448 161 -287
- Misses 8 25 +17 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
If I understand correctly, the goal is to implement the HDF5 filter 32026 in https://github.com/silx-kit/hdf5plugin/blob/v5.1.0/src/PyTables/hdf5-blosc2/src/blosc2_filter.c According to my reading of https://github.com/Blosc/c-blosc2/blob/v2.17.1/README_EXTENSION_FILENAMES.rst, there are also .b2frame and .b2nd formats. Therefore, the format here should be called |
My goal is to implement a stand-alone blosc2 compressor/decompressor. I did not intend to connect it to HDF5, although that should be possible. The format I am implementing uses "super-chunks" which were introduced in blosc2. They allow compressing more than 2 GByte of data. Blosc2 still supports the compression methods used by blosc1 with their size limit. It would be possible to add support for this in |
The |
ping |
|
||
# There's more unused/unchecked data | ||
c[end-50] = 0x40 | ||
# BROKEN @test_throws Blosc2DecodingError decode(Blosc2DecodeOptions(), c) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is okay for a format not to checksum everything.
# Finally, this corruption has an effect | ||
c[end-100] = 0x40 | ||
# Windows segfaults in this call with exit code 3221226356, | ||
# indicating a heap corruption. That's clearly a bug in c-blosc2. | ||
# It seems c-blosc2 does not checksum its compressed data. | ||
if !Sys.iswindows() | ||
@test_throws Blosc2DecodingError decode(Blosc2DecodeOptions(), c) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the file that is causing the segfault.
bad_file.txt
It would be good to see if this crashes https://github.com/Blosc/c-blosc2/blob/main/examples/decompress_file.c as well.
Until this is resolved, the documentation for this package should have warnings not to use the package with potentially invalid inputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, decompress_file
does segfault on this input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like a bug in blosc2 or the example. This issue isn't with checksums, it is probably blosc2 missing a bounds check somewhere. Can you report this upstream?
This is a first stab at implementing a Blosc2 codec. I believe the implementation is correct. I am looking for feedback.